ABSTRACT

The CellLineNavigator database, freely available at 
http://www.medicalgenomics.org/celllinenavigator, 
is a web-based workbench for large scale compari- 
sons of a large collection of diverse cell lines. It aims 
to support experimental design in the fields of 
genomics, systems biology and translational biomedical
research. Currently, this compendium
holds genome-wide expression profiles of 317 different
cancer cell lines, categorized into 57 different
pathological states and 28 individual tissues. To
enlarge the scope of CellLineNavigator, the 
database was furthermore closely linked to 
commonly used bioinformatics databases and 
knowledge repositories. To ensure easy data
access and searchability, a simple data interface and an
intuitive querying interface were implemented. It
allows the user to explore and filter gene expres- 
sion, focusing on pathological or physiological con- 
ditions. For a more complex search, the advanced 
query interface may be used to query for (i) differen- 
tially expressed genes; (ii) pathological or physio- 
logical conditions; or (iii) gene names or functional 
attributes, such as Kyoto Encyclopaedia of Genes 
and Genomes pathway maps. These queries may 
also be combined. Finally, CellLineNavigator allows 
additional advanced analysis of differentially 
regulated genes by a direct link to the Database 
for Annotation, Visualization and Integrated 
Discovery (DAVID) Bioinformatics Resources. 



INTRODUCTION 

In vitro cancer cell culture experiments provide the oppor- 
tunity of analysing and modelling the complex mechan- 
isms of tumour biology through facile experimental 



manipulations, global as well as detailed mechanistic 
studies. They are, therefore, of significant aid in molecular 
biomedical research. A crucial role of cancer cell lines for 
medical, scientific and pharmaceutical institutions was 
elucidated by systematic analysis on lung cancer cell 
lines (1,2). They revealed not only the amazingly 
complex role of the cancer genome but also identified 
and characterized driver mutations in those cell lines. 
Further studies on cancer cell lines led to the characterization
of tumor protein 53 (TP53) and the understanding
of multiple genetic mutations, mutant allele-specific imbalances
and copy number losses in cancer (3-5). Moreover,
the ability to translate these findings to clinical applications
has led to rational therapeutic drug selection (6).
For example, activating mutations in the epidermal 
growth factor receptor (EGFR) kinase domain have 
major clinical implications in lung cancer, and it was 
shown in cell line experiments that tumours with this 
mutation are sensitive to tyrosine kinase inhibitors (7). 
However, varying responses to treatment or to
targeted manipulation of gene expression have repeatedly been
observed in diverse cancer cell lines. This was attributed to their
diverse genetic backgrounds and, consequently, diverse gene
expression. Thus, information on these diverse gene expression
profiles in cancer cell lines may be crucial to
the experimental design of in vitro cancer models and
to testing novel therapeutic approaches.

We have, therefore, generated CellLineNavigator, a 
workbench for the biomedical community, which allows 
querying the transcriptome of a great variety of cancer cell
lines to screen for the most suitable cell line for upcoming 
experiments. To enlarge the scope of this database, the 
data were linked to common functional and genetic data- 
bases, enabling querying for a more systematic view on 
cell line expression profiles. 

In summary, we have generated a comprehensive 
database containing expression profiles of 317 cancer cell 
lines representing 57 different pathological states and 28 
individual tissues. This database will aid the design of 
in vitro experiments in cancer research, as it will allow 
taking the genetic background of these cell lines into consideration.
The CellLineNavigator database is publicly
available at http://www.medicalgenomics.org/celllinenavigator/.

MATERIALS AND METHODS 

Data source, data processing 

Genome-wide expression data of multiple cell lines, freely 
available at ArrayExpress [database ID: E-MTAB-37 (8)], 
were publicly provided by Greshock et al. (Laboratory of 
Cancer Metabolism Drug Discovery, GlaxoSmithKline, 
Collegeville, PA, USA). The cell lines were handled as 
previously described (9). Briefly, the transcript abundance 
of 317 cancer cell lines was analysed using the Affymetrix
Human Genome-U133 Plus2 GeneChip technology. This
chip covers the complete human genome for analysis of
>45 000 transcripts and >19 000 genes. All data were
available in technical triplicates. Corresponding information
on tissue site and disease state was provided for each
cell line (Figure 1).

The differential expression was analysed using the
R-Project (10)/Bioconductor (11) suite with the following
additional libraries: 'affy' (12), 'hgu133plus2.db' (13) and
'frma' (14,15). After quality control, two microarray
experiments (cell line SNU398, replicate 1, and cell line
SNU423, replicate 2) were excluded from further analysis
because of insufficient RNA level detection. All data were
normalized using the 'expresso' function of the 'affy'
package with the following settings: background adjustment
method: 'mas', normalization method: 'quantiles', 
PerfectMatch (PM) adjustment method: 'mas' and the 
method used for the computation of expression values: 
'medianpolish'. Next, we calculated the expression 
median for each probe set for all cell lines. These values 
were subsequently used as control to calculate log2 trans- 
formed expression ratios (M-values), after the median ex- 
pression was calculated for each cancer cell line. M-values 
representing the expression levels of tissue sites and disease 
states were calculated accordingly. Gene expression 
barcodes were generated using the 'frma' (frozen robust 
multiarray analysis) (default options) and 'barcode' 
(output: Z-score) function implemented in the 'frma' 
package. A frma Z-score of >5 suggested that a gene 
is expressed in a particular tissue. The frma Z-score 
was generated to allow comparison of the expression 
profiles with data already present at medicalgenomics.org 
(16,17) and other microarray data sets processed with the 
frma method. Official gene symbols and National Center
for Biotechnology Information (NCBI) Entrez GeneIDs
were assigned to the data using the 'hgu133plus2.db'
package.
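
The M-value calculation described above can be illustrated with a minimal Python sketch. It is not the original R/Bioconductor code; it shows one reading of the procedure, in which technical replicates are first collapsed to a median per cell line and the median across all cell lines then serves as the control. The function name, the input layout and the toy intensities are assumptions made for illustration, and the intensities are assumed to be on a linear scale.

```python
import math
from statistics import median


def m_values(intensities):
    """Return log2 expression ratios (M-values) per cell line and probe set.

    intensities: {cell_line: {probe_set: [replicate intensities, linear scale]}}
    (hypothetical layout, not the format used by the original pipeline)
    """
    # Step 1: collapse technical replicates to one median value per cell line.
    per_line = {
        line: {probe: median(reps) for probe, reps in probes.items()}
        for line, probes in intensities.items()
    }
    # Step 2: control value = median of each probe set across all cell lines.
    probe_sets = next(iter(per_line.values())).keys()
    control = {
        probe: median(values[probe] for values in per_line.values())
        for probe in probe_sets
    }
    # Step 3: M-value = log2(cell line expression / control expression).
    return {
        line: {
            probe: math.log2(value / control[probe])
            for probe, value in values.items()
        }
        for line, values in per_line.items()
    }


# Toy data (invented for illustration): three cell lines, one probe set.
toy = {
    "LINE_A": {"probe_1": [100.0, 200.0, 400.0]},
    "LINE_B": {"probe_1": [50.0, 50.0, 50.0]},
    "LINE_C": {"probe_1": [800.0, 800.0, 800.0]},
}
result = m_values(toy)
```

In this toy example the control value is 200, so LINE_A has an M-value of 0, LINE_B of -2 (4-fold lower) and LINE_C of +2 (4-fold higher).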

To enable an integrative comparison and querying 
between gene expression and biological function informa- 
tion, all data were linked to commonly used and estab- 
lished bioinformatics databases and knowledge 
repositories, such as NCBI Entrez database (18), HUGO 
Gene Nomenclature Committee (HGNC) (19), Human
Protein Reference Database (HPRD) (20), Online 
Mendelian Inheritance in Man (OMIM) (21), BioGPS 



(22), NextBio (23) and GENT (24). Moreover, the Kyoto
Encyclopaedia of Genes and Genomes (KEGG) (25) was 
connected to identify gene signalling and molecular 
pathway associations. Data on cellular component, biolo- 
gical process and molecular function were collected from 
the Gene Ontology database (26). Finally, the 
CellLineNavigator was cross-linked to our RNA-Seq ex- 
pression profiling database on normal tissues, RNA-Seq 
Atlas (16), and our liver-specific Library of Molecular 
Associations [LoMA (17)]. 

Data organization and Web interface

The backbone of CellLineNavigator is a Linux-PostgreSQL-Apache-PHP
stack implemented in a content
management system (Drupal: http://drupal.org/). The
database is organized around a menu that gives
direct access to the following sections: news, data,
search, download and help.

Information on current statistics and recent changes
is posted in the news section to keep users up to
date, whereas the download section allows users
to download the complete CellLineNavigator database
in tab-separated text file format.

CellLineNavigator may easily be accessed through a 
simple (data section) or advanced querying interface 
(search section). 

The data section offers the possibility to explore tissues 
(default) or disease states (Figure 2). Filtering options for 
expression levels within individual tissues or disease states 
are supported. The default filter lists all genes whose
expression level differs at least 2-fold from that of
the respective control. Six additional levels of expression
filtering are supported (from 1.5- to 5-fold). However,
the user may also set the filter criteria to none (no filter) or
no regulation (list all genes whose M-values are in the
range of -1 to +1).
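
Because M-values are log2 ratios, a 2-fold difference corresponds to an absolute M-value of 1. The following Python sketch shows how the filter options described above could be expressed on that scale. It is an illustration only, not the code behind the Web interface; the function name, the mode labels, the inclusive boundaries and the toy values are assumptions.

```python
import math


def passes_filter(m_value, mode="fold", fold=2.0):
    """Decide whether a gene with the given M-value passes the chosen filter.

    mode "fold":          keep genes regulated at least `fold`-fold, up or down
    mode "no_regulation": keep genes whose M-value lies between -1 and +1
    mode "none":          keep every gene
    """
    if mode == "none":
        return True
    if mode == "no_regulation":
        return -1.0 <= m_value <= 1.0
    if mode == "fold":
        # A fold change f corresponds to |M| >= log2(f); 2-fold gives |M| >= 1.
        return abs(m_value) >= math.log2(fold)
    raise ValueError(f"unknown filter mode: {mode}")


# Toy M-values (invented for illustration).
toy_m = {"GENE_A": 2.5, "GENE_B": -1.4, "GENE_C": 0.1}
two_fold = sorted(g for g, m in toy_m.items() if passes_filter(m))
five_fold = sorted(g for g, m in toy_m.items() if passes_filter(m, fold=5.0))
unregulated = sorted(
    g for g, m in toy_m.items() if passes_filter(m, mode="no_regulation")
)
```

With these toy values, the default 2-fold filter keeps GENE_A and GENE_B, the 5-fold filter keeps only GENE_A, and the no regulation setting keeps only GENE_C.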

To allow users a high degree of flexibility to access 
CellLineNavigator, we implemented an advanced search 
section, offering the user 'Fulltext search' or 'Explore 
profile' options (Figure 3). The 'Fulltext search' may be
used to query individual genes provided by the user
for their expression levels within specific cell lines, tissues
or disease states (or any combination of these). Using the
'Explore profile' query option, the user may query for
specific expression levels classified in the fields of
(i) genes, (ii) KEGG pathway maps, (iii) gene ontologies,
(iv) cell lines, (v) tissues or (vi) disease states. Again, a
combination of all query types is possible. Moreover, the
user may also define cut-off criteria to filter for specific
expression levels. The resulting gene list is shown in an
interface providing the same features as described for the
data section, with one exception: the filter criteria are
adjusted to the preceding query. These features may
again be used to further filter the resulting gene list.
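
The way several query fields and an expression cut-off combine can be sketched in Python as follows. This is an in-memory illustration under stated assumptions, not the PostgreSQL/PHP implementation of the Web interface: the record layout, the field names and the function name are hypothetical, all criteria are assumed to be combined with a logical AND, and the toy records are invented.

```python
import math


def explore_profile(records, min_fold=None, **criteria):
    """Return the records matching every given criterion.

    records:  list of dicts with the hypothetical keys 'gene', 'cell_line',
              'tissue', 'disease_state' and 'm_value'
    criteria: field name -> set of accepted values, e.g. tissue={"bladder"}
    min_fold: optional cut-off; keep records regulated at least this much
    """
    threshold = math.log2(min_fold) if min_fold else None
    hits = []
    for rec in records:
        # Every supplied field must match (logical AND across query types).
        if any(rec[field] not in allowed for field, allowed in criteria.items()):
            continue
        # Optional expression cut-off on the log2 scale.
        if threshold is not None and abs(rec["m_value"]) < threshold:
            continue
        hits.append(rec)
    return hits


# Toy records (invented for illustration).
records = [
    {"gene": "GENE_A", "cell_line": "LINE_1", "tissue": "bladder",
     "disease_state": "carcinoma", "m_value": 1.8},
    {"gene": "GENE_A", "cell_line": "LINE_2", "tissue": "lung",
     "disease_state": "carcinoma", "m_value": -0.3},
    {"gene": "GENE_B", "cell_line": "LINE_1", "tissue": "bladder",
     "disease_state": "carcinoma", "m_value": 0.4},
]
bladder_hits = explore_profile(records, min_fold=2, tissue={"bladder"})
gene_a_hits = explore_profile(records, gene={"GENE_A"})
```

Here the first query keeps only the bladder record of GENE_A, whereas the second returns both GENE_A records because no cut-off is applied.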

To allow users a more customizable way of displaying 
the expression results, an extra option for setting the regu- 
lation view is supported (default: 2-fold). 

Moreover, a powerful resource within our database 
extending the full impact of the individual gene-cell line 
relations is provided in the details section (Figure 4). This 
section may be accessed by clicking on either the detail
link or the specific expression icon in the results tables. It
provides additional information on gene symbol, description,
aliases, chromosomal location, Entrez ID, Ensembl
ID, Gene Ontology, KEGG pathway and expression
profiles. The expression profiles are tailored
to the preceding user query. For example, if the
user clicked on the expression icon of the tissue site 'bladder',
the details view will show a bar chart with an overview of
the expression within all tissues and, more importantly,
a bar chart representing the specific expression values
of the cell lines corresponding to the tissue of interest. For
further comparison with data already available at
medicalgenomics.org, such as RNA-Seq Atlas, the
details view may be switched from M-value to frma
Z-score representation.

Finally, a major strength of the database is its direct 
connection to the Database for Annotation, Visualization 
and Integrated Discovery (DAVID) Bioinformatics 
Resources (27). Gene lists generated in CellLineNavigator
may automatically be transferred to the
DAVID analysis tools. 



DISCUSSION 

The current scope of biomedical studies on tumorigenesis 
requires a tremendous amount of human tumour material. 
Stringent restrictions on the international exchange of bio- 
logical reagents and increasing requirements from insti- 
tutes, ethics committees and government are limiting the 
availability of those human tumour materials. Thus, 
in vitro cell culture is highly useful for modelling the 
complex mechanisms of cancer development to identify 
molecular mechanisms related to tumour development 
and potential therapeutic targets. 

Cell lines are capable of infinite replication and, there- 
fore, offer an unlimited source of biomedical material that 
can be distributed to laboratories worldwide and thus, 
allow direct comparison of research results if originating 
from identical material. As a matter of fact, these cell lines 
are widely used in biomedical research. However, the 
detailed knowledge about their genetic profiles is still 
limited and has not been summarized in a large compara- 
tive database. 

Diverse biological behaviour of cancer cell lines may 
result from diverse underlying genetic profiles and 
Figure 2. The data section offers the possibility to explore tissues or disease states. By default, the user will get a list of all genes that show an
expression level differing at least 2-fold from the respective control within the specific tissues. The set view option allows the user to
switch between tissue and disease state views. Six additional filter options for expression levels are supported (default: 2-fold). Further, the user can
set the expression filter criteria to none (no filter) or no regulation (list all genes whose M-values are in the range of -1 to +1). To allow users a more
customizable way of displaying the data, the user may change the cut-off criteria for differentially expressed genes (default: 2-fold).



expression signatures, which may differ significantly 
among immortalized tumour cell lines. Awareness of
these differences makes it useful, if not necessary, to take the
diverse gene expression signatures into account, especially
when planning targeted strategies to influence the biological
behaviour. Large-scale microarray experiments that
unravel the genetic profile of these cell lines are available
through public databases, such as ArrayExpress (8) or
Gene Expression Omnibus (28).

However, the analysis of these data is hardly feasible for
biologists or physicians without substantial bioinformatics
skills or at least knowledge of microarray analysis. Even
with profound experience in microarray technology,
analysis of such data is a complex and time-consuming
task. To our knowledge, the Gene Expression
Atlas (8) is so far the only database that provides access to



these cell line expression profiles. However, the main
focus of this database is not on cancer cell lines, and
thus, it contains the expression profiles of only ~90 cell
lines from various species. Moreover, the classification
into specific phenotypes is incomplete, and the data were
collected from multiple laboratories and, therefore,
reflect multiple experimental conditions, making a comparison
between the expression profiles extremely
difficult.

The database presented here, CellLineNavigator,
contains gene expression profiles of >300 human cancer
cell lines. These expression profiles were generated in the
same laboratory under nearly the same experimental
conditions and thus ensure a high degree of comparability.
Further, depending on phenotypic information,
these cell lines were classified into corresponding
Figure 3. The search section allows users to choose between the 'Fulltext search' and 'Explore profile' options. In the 'Fulltext search', the user can
provide a gene list that can be queried for expression levels within specific cell lines, tissue sites or disease states (or any combination of these). The
'Explore profile' option allows the user to query for specific expression levels classified in the fields (i) gene, (ii) KEGG pathway maps, (iii) Gene Ontology,
(iv) cell line, (v) tissue site or (vi) disease state. A combination of all query types is possible. Additionally, the user may define cut-off criteria to
filter for specific expression levels.
Figure 4. The details section offers a powerful option to access the full impact of the individual gene-cell line relation. The expression profiles are
tailored to the preceding user query. For example, if the user is interested in the expression of the tissue 'bladder', the details view displays a
bar chart showing an overview of the expression within all tissues and, more importantly, a bar chart representing the specific expression values of
the cell line(s) corresponding to the tissue 'bladder'. Further, additional information on gene symbol, description, aliases, chromosomal
location, Entrez ID, Ensembl ID, Gene Ontology and KEGG pathway is provided.



tissues of origin and disease states. The main focus of 
CellLineNavigator is not simply on summarizing these
data but rather on easy and user-friendly access
as well as on the linkage to advanced bioinformatics analysis
tools. To guarantee easy data access and connectivity, we
implemented a largely self-explanatory Web application as
a user-friendly front end to the database. This Web application
allows users to query for (i) differentially
expressed genes; (ii) pathological (e.g. melanoma) or 
physiological (e.g. lung) conditions or (iii) gene names or 
functional attributes, such as KEGG pathway maps. 
A combination of all query types is possible. 

Comparative analysis of differential gene expression 
between cell lines or diseases of interest will initially 



often result in (large) lists of differentially
regulated genes. To further characterize the differences
between the respective samples, these collections
of genes need to be characterized with respect to
functional or structural similarities; supporting this step is a major
advance in the usability of this database. We, therefore, chose
to link and provide an automated data transfer of gene 
lists of interest to DAVID, a large bioinformatics suite 
providing functional and structural analyses, such as 
pathway enrichment, gene ontology enrichment or 
analysis of functional domains. This automated linkage 
to DAVID brings our database resource and analysis 
tool to the next level of not only comparing genetic 
changes but also functionally and structurally 
characterizing the differences by means of advanced bioinformatics.
In summary, CellLineNavigator is the first
database providing comprehensive summary, display and 
analysis options for gene expression data of the most 
commonly used cancer cell lines. It provides access to
large microarray data sets without requiring advanced bioinformatics
skills. Thus, CellLineNavigator may be of significant
aid for in vitro modelling of cancer mechanisms and
testing of novel therapeutic approaches.